 reference graph



KGpipe: Generation and Evaluation of Pipelines for Data Integration into Knowledge Graphs

Hofer, Marvin, Rahm, Erhard

arXiv.org Artificial Intelligence

Building high-quality knowledge graphs (KGs) from diverse sources requires combining methods for information extraction, data transformation, ontology mapping, entity matching, and data fusion. Numerous methods and tools exist for each of these tasks, but support for combining them into reproducible and effective end-to-end pipelines is still lacking. We present a new framework, KGpipe, for defining and executing integration pipelines that can combine existing tools or LLM (Large Language Model) functionality. To evaluate different pipelines and the resulting KGs, we propose a benchmark to integrate heterogeneous data of different formats (RDF, JSON, text) into a seed KG. We demonstrate the flexibility of KGpipe by running and comparatively evaluating several pipelines integrating sources of the same or different formats using selected performance and quality metrics.


Practical Causal Evaluation Metrics for Biological Networks

Sato, Noriaki, Scutari, Marco, Kawano, Shuichi, Yamaguchi, Rui, Imoto, Seiya

arXiv.org Artificial Intelligence

Estimating causal networks from biological data is a critical step in systems biology. When evaluating the inferred network, assessing the networks based on their intervention effects is particularly important for downstream probabilistic reasoning and the identification of potential drug targets. In the context of gene regulatory network inference, biological databases are often used as reference sources. These databases typically describe relationships in a qualitative rather than quantitative manner. However, few evaluation metrics have been developed that take this qualitative nature into account. To address this, we developed a metric, the sign-augmented Structural Intervention Distance (sSID), and a weighted sSID that incorporates the net effects of the intervention. Through simulations and analyses of real transcriptomic datasets, we found that our proposed metrics could identify a different algorithm as optimal compared to conventional metrics, and the network selected by sSID had a superior performance in the classification task of clinical covariates using transcriptomic data. This suggests that sSID can distinguish networks that are structurally correct but functionally incorrect, highlighting its potential as a more biologically meaningful and practical evaluation metric.



Appendix: Permutation-Invariant Variational Autoencoder for Graph-Level Representation Learning

Neural Information Processing Systems

Since we apply the row-wise softmax in Eq. (7), ... Each self-attention layer was followed by a point-wise fully connected neural network with two layers (1024 hidden dim) and a residual connection. We set the graph embedding dimension to 64. We tried different weightings of reconstruction and permutation matrix penalty loss to maximize the reconstruction accuracy with a discretized permutation matrix, while enabling stable training. In section 4.1 we describe how distances in the graph embedding space ... One important property of the GED is its invariance to the node ordering of graphs that are compared. As discussed in section 2.2 (Key architectural properties), we carefully ... This is exactly what we would expect.


It Takes a Graph to Know a Graph: Rewiring for Homophily with a Reference Graph

Mendelman, Harel, Maron, Haggai, Talmon, Ronen

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) excel at analyzing graph-structured data but struggle on heterophilic graphs, where connected nodes often belong to different classes. While this challenge is commonly addressed with specialized GNN architectures, graph rewiring remains an underexplored strategy in this context. We provide theoretical foundations linking edge homophily, GNN embedding smoothness, and node classification performance, motivating the need to enhance homophily. Building on this insight, we introduce a rewiring framework that increases graph homophily using a reference graph, with theoretical guarantees on the homophily of the rewired graph. To broaden applicability, we propose a label-driven diffusion approach for constructing a homophilic reference graph from node features and training labels. Through extensive simulations, we analyze how the homophily of both the original and reference graphs influences the rewired graph homophily and downstream GNN performance. We evaluate our method on 11 real-world heterophilic datasets and show that it outperforms existing rewiring techniques and specialized GNNs for heterophilic graphs, achieving improved node classification accuracy while remaining efficient and scalable to large graphs.
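The edge homophily mentioned in this abstract is commonly defined as the fraction of edges whose endpoints share a class label. The sketch below computes only that ratio on a toy graph; it is not the paper's rewiring framework, and the function name, node labels, and edge lists are hypothetical examples chosen for illustration.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints carry the same class label."""
    if not edges:
        raise ValueError("edge list is empty")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy graph (assumed data): four nodes, two classes.
labels = {0: "a", 1: "a", 2: "b", 3: "b"}

# Mostly cross-class edges: a heterophilic graph.
heterophilic = [(0, 2), (0, 3), (1, 2), (0, 1)]

# A hypothetical rewired edge set that keeps more same-class edges.
rewired = [(0, 1), (2, 3), (1, 2)]

print(edge_homophily(heterophilic, labels))
print(edge_homophily(rewired, labels))
```

A ratio near 0 indicates a heterophilic graph and a ratio near 1 a homophilic one, so a rewiring step that raises this number moves the graph toward the regime where standard GNNs tend to work well.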


Synthesizing Diverse Network Flow Datasets with Scalable Dynamic Multigraph Generation

Grayeli, Arya, Swarup, Vipin, Noel, Steven E.

arXiv.org Artificial Intelligence

Obtaining real-world network datasets is often challenging because of privacy, security, and computational constraints. In the absence of such datasets, graph generative models become essential tools for creating synthetic datasets. In this paper, we introduce a novel machine learning model for generating high-fidelity synthetic network flow datasets that are representative of real-world networks. Our approach involves the generation of dynamic multigraphs using a stochastic Kronecker graph generator for structure generation and a tabular generative adversarial network for feature generation. We further employ an XGBoost (eXtreme Gradient Boosting) model for graph alignment, ensuring accurate overlay of features onto the generated graph structure. We evaluate our model using new metrics that assess both the accuracy and diversity of the synthetic graphs. Our results demonstrate improvements in accuracy over previous large-scale graph generation methods while maintaining similar efficiency. We also explore the trade-off between accuracy and diversity in synthetic graph dataset creation, a topic not extensively covered in related works. Our contributions include the synthesis and evaluation of large real-world netflow datasets and the definition of new metrics for evaluating synthetic graph generative models.
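For readers unfamiliar with stochastic Kronecker graphs, the following is a minimal sketch of the basic idea: a small initiator matrix of edge probabilities is raised to a Kronecker power, and each edge is sampled independently from the resulting probability. It is a naive quadratic-time illustration of a static directed graph, not the scalable dynamic multigraph generator described in the abstract; the function names and the initiator values are assumptions.

```python
import random


def kronecker_edge_prob(initiator, k, u, v):
    """Edge probability of (u, v) in the k-th Kronecker power of `initiator`.

    The entry is the product of initiator entries indexed by the base-n
    digits of u and v, where n is the initiator size.
    """
    n = len(initiator)
    p = 1.0
    for _ in range(k):
        p *= initiator[u % n][v % n]
        u //= n
        v //= n
    return p


def sample_kronecker_graph(initiator, k, seed=0):
    """Sample a directed edge list on n**k nodes, one Bernoulli draw per pair."""
    rng = random.Random(seed)
    n_nodes = len(initiator) ** k
    return [
        (u, v)
        for u in range(n_nodes)
        for v in range(n_nodes)
        if rng.random() < kronecker_edge_prob(initiator, k, u, v)
    ]


# Hypothetical 2x2 initiator; k = 3 gives a graph on 8 nodes.
initiator = [[0.9, 0.5], [0.5, 0.1]]
edges = sample_kronecker_graph(initiator, k=3, seed=42)
print(len(edges), "edges sampled on", 2 ** 3, "nodes")
```

Checking every node pair costs O(N^2), which is only practical for small graphs; large-scale generators avoid this by sampling edges recursively rather than enumerating pairs.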


A Theoretical Analysis of Compositional Generalization in Neural Networks: A Necessary and Sufficient Condition

Li, Yuanpeng

arXiv.org Artificial Intelligence

Compositional generalization is a crucial property in artificial intelligence, enabling models to handle novel combinations of known components. While most deep learning models lack this capability, certain models succeed in specific tasks, suggesting the existence of governing conditions. This paper derives a necessary and sufficient condition for compositional generalization in neural networks. Conceptually, it requires that (i) the computational graph matches the true compositional structure, and (ii) components encode just enough information in training. The condition is supported by mathematical proofs. This criterion combines aspects of architecture design, regularization, and training data properties. A carefully designed minimal example illustrates an intuitive understanding of the condition. We also discuss the potential of the condition for assessing compositional generalization before training. This work is a fundamental theoretical study of compositional generalization in neural networks.


Can Graph Neural Networks Expose Training Data Properties? An Efficient Risk Assessment Approach

Yuan, Hanyang, Xu, Jiarong, Huang, Renhong, Song, Mingli, Wang, Chunping, Yang, Yang

arXiv.org Artificial Intelligence

Graph neural networks (GNNs) have attracted considerable attention due to their diverse applications. However, the scarcity and quality limitations of graph data present challenges to their training process in practical settings. To facilitate the development of effective GNNs, companies and researchers often seek external collaboration. Yet, directly sharing data raises privacy concerns, motivating data owners to train GNNs on their private graphs and share the trained models. Unfortunately, these models may still inadvertently disclose sensitive properties of their training graphs (e.g., average default rate in a transaction network), leading to severe consequences for data owners. In this work, we study graph property inference attacks to identify the risk of sensitive property information leakage from shared models. Existing approaches typically train numerous shadow models to develop such attacks, which is computationally intensive and impractical. To address this issue, we propose an efficient graph property inference attack by leveraging model approximation techniques. Our method only requires training a small set of models on graphs, while generating a sufficient number of approximated shadow models for attacks. To enhance diversity while reducing errors in the approximated models, we apply edit distance to quantify the diversity within a group of approximated models and introduce a theoretically guaranteed criterion to evaluate each model's error. Subsequently, we propose a novel selection mechanism to ensure that the retained approximated models achieve high diversity and low error. Extensive experiments across six real-world scenarios demonstrate our method's substantial improvement, with average increases of 2.7% in attack accuracy and 4.1% in ROC-AUC, while being 6.5$\times$ faster compared to the best baseline.


Self-Supervised Path Planning in UAV-aided Wireless Networks based on Active Inference

Krayani, Ali, Khan, Khalid, Marcenaro, Lucio, Marchese, Mario, Regazzoni, Carlo

arXiv.org Artificial Intelligence

This paper presents a novel self-supervised path-planning method for UAV-aided networks. First, we employed an optimizer to solve training examples offline and then used the resulting solutions as demonstrations from which the UAV can learn the world model to understand the environment and implicitly discover the optimizer's policy. A UAV equipped with the world model can make real-time autonomous decisions and engage in online planning using active inference. During planning, the UAV can score different policies based on the expected surprise, allowing it to choose among alternative ...

Secondly, we use the learned world model as an internal generative model enriched with active states to simulate the environment and plan actions that minimize the agent's surprise during online decision-making. This approach enables the UAV to navigate its surroundings with a reference model representing the goal, choosing actions that minimize unexpected or unusual observations (surprise) measured by how much they deviate from the expected goal. The main contributions of this paper are as follows: It expands on previous research [11] by exploring online planning, a prospective form of cognition.